The hairpin completion is an operation on formal languages which is inspired by the hairpin
formation in biochemistry. Hairpin formations occur naturally within DNA-computing. It has
been known that the hairpin completion of a regular language is linear context-free, but not
regular, in general. However, for some time it was open whether the regularity of the hairpin
completion of a regular language is decidable. In 2009 this decidability problem was
solved positively in [5] by providing a polynomial time algorithm. In this paper we improve
the complexity bound by showing that the decision problem is actually NL-complete. This
complexity bound holds for both the one-sided and the two-sided hairpin completions.



Keywords: Automata and Formal Languages; Regular Languages, Finite Automata; NL-

1 Introduction

The hairpin completion is a natural operation of formal languages which has been inspired by 
molecular phenomena in biology and which occurs naturally during DNA-computing. An in- 
tramolecular base pairing, known as a hairpin, is a pattern that can occur in single-stranded DNA 
and, more commonly, in RNA. Hairpin or hairpin-free structures have numerous applications to 
DNA computing and molecular genetics, see [51l51l71 [Tf]lHTj and the references within for a detailed 
discussion. For example, an instance of 3-Sat has been solved with a DNA-algorithm and one of 
the main concepts was to eliminate all molecules with a hairpin structure, see [15] . 

In this paper we study the hairpin completion from a purely formal language viewpoint. The 
hairpin completion of a formal language was first defined by Cheptea, Martin-Vide, and Mitrana
in [2]; here we use a slightly more general definition which was introduced in [5]. The hairpin
completion and some related operations have been studied in a series of papers from language 
theoretic and algorithmic point of view, see e.g., [51 [T2HT7] . The formal operation of the hairpin 
completion on words is best explained in Figure 1. In that picture, as in the rest of the paper, we
mean by putting a bar on a word (like $\overline{\alpha}$) to read it from right to left and in addition to replace
a letter $a$ with the (Watson-Crick) complement $\overline{a}$. The hairpin completion of a regular language
is linear context-free, but not regular, in general [2].

For some time it was not known whether regularity of the hairpin completion of a regular 
language is decidable. It was only in 2009 that we presented a decision algorithm in [5]. Actually,
we proved a better result by providing a polynomial time algorithm with a (rough) runtime
estimation of about $O(n^{20})$.

Figure 1: Hairpin completion of a DNA-strand (or a word).



In an extended abstract which appeared at CIAA 2010 we presented a modified approach
to solve the same problem [4] which led, in particular, to the following two new results: First,
the time complexity of the new decision algorithm is in $O(n^8)$. Second, the decision problem is
NLOGSPACE-complete, i.e., NL-complete.

This paper is the journal version of [4] for the second result. We decided to focus on the space
complexity since, in terms of complexity, NL-completeness yields a precise characterization and
because the given page limit did not allow us to include full proofs for all results of [4]. Moreover,
our proofs are still rather technical and the focus on the NL-algorithm simplifies the presentation.

We consider the one-sided and the two-sided hairpin completions simultaneously. It turns out 
that NL-completeness holds in both cases. 

The paper is organized as follows. In Section 2 we fix the notation used throughout. We give
the formal definition of the hairpin completion $\mathcal{H}_k(L_1, L_2)$ and we discuss our input model using
appropriate deterministic automata.

In Section 3 we state the main result (Theorem 3.1) and we give a full proof in the subsequent
subsections. A main technical tool is the use of single-valued non-deterministic log-space
transductions, which might not be entirely standard. They are explained in Section 3.1. In Section 4 we
give a short conclusion and we state some open problems.



2 Preliminaries and Notation 

We assume the reader to be familiar with the basic concepts of formal language theory, automata 
theory, and complexity theory, as one can find in the text books [S1HB] . By NL we mean the 
complexity class NLOGSPACE, which contains the problems which can be decided by a non- 
deterministic Turing machine using O(logn) work space. Throughout we use the well-known 
result that NL is closed under complementation, see e.g. [18]. We also use the fact that if L can 
be reduced to $L'$ via some single-valued non-deterministic log-space transduction and $L' \in \mathrm{NL}$,
then we have $L \in \mathrm{NL}$, see pQ and Section 3.1 for more details.

By $\Sigma$ we denote a finite alphabet with at least two letters. The set of words over $\Sigma$ is denoted
$\Sigma^*$, and the empty word is denoted by 1. Given a word $w$, we denote by $|w|$ its length and by
$w(m) \in \Sigma$ its $m$-th letter. If $w = xyz$ for some $x, y, z \in \Sigma^*$, then $x$ and $z$ are called prefix and
suffix of $w$, respectively. By a proper prefix $x$ of $w$ we mean a prefix such that $x \neq w$ (but we
allow $x = 1$). The prefix relation between words $x$ and $w$ is denoted by $x \le w$ and for proper
prefixes by $x < w$.

We assume that the alphabet $\Sigma$ is equipped with an involution $\overline{\phantom{a}} : \Sigma \to \Sigma$. An involution for
a set is a bijection such that $\overline{\overline{a}} = a$. We extend the involution to words $a_1 \cdots a_n$ by
$\overline{a_1 \cdots a_n} = \overline{a_n} \cdots \overline{a_1}$, where the $a_i$'s are letters. This convention is like taking inverses in groups.
For languages $L \subseteq \Sigma^*$ we denote by $\overline{L}$ the set

$$\overline{L} = \{ \overline{w} \mid w \in L \}.$$

Throughout the paper $L_1, L_2$ are two regular languages in $\Sigma^*$ and by $k$ we mean a (small) constant.
(In a biological setting $k \approx 10$ yields a reasonable value.) According to Figure 1 we define the
hairpin completion $\mathcal{H}_k(L_1, L_2)$ by

$$\mathcal{H}_k(L_1, L_2) = \{ \gamma\alpha\beta\overline{\alpha}\overline{\gamma} \mid (\gamma\alpha\beta\overline{\alpha} \in L_1 \lor \alpha\beta\overline{\alpha}\overline{\gamma} \in L_2) \land |\alpha| = k \}.$$

This definition is slightly more general than the original definition in [2, 16]. It allows us to treat
the two-sided hairpin completion ($L_1 = L_2$) and the one-sided hairpin completion (either $L_1 = \emptyset$
or $L_2 = \emptyset$) simultaneously.
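
To make the involution and the definition of $\mathcal{H}_k(L_1, L_2)$ concrete, the following Python sketch tests membership by brute force over all possible lengths of $\gamma$. It is only an illustration under assumptions that are not part of the paper: upper-case letters encode barred letters, and the names `bar`, `in_hairpin_completion`, `in_l1`, and `in_l2` are hypothetical.

```python
import re
from typing import Callable

# Assumed encoding: an upper-case letter stands for the barred (complementary) letter.
COMPLEMENT = {"a": "A", "A": "a", "b": "B", "B": "b"}

def bar(word: str) -> str:
    """Involution on words: complement every letter and reverse the order."""
    return "".join(COMPLEMENT[c] for c in reversed(word))

def in_hairpin_completion(w: str, k: int,
                          in_l1: Callable[[str], bool],
                          in_l2: Callable[[str], bool]) -> bool:
    """Check whether w = gamma alpha beta bar(alpha) bar(gamma) with |alpha| = k and
    (gamma alpha beta bar(alpha) in L1  or  alpha beta bar(alpha) bar(gamma) in L2)."""
    n = len(w)
    for g in range((n - 2 * k) // 2 + 1):        # g = |gamma|
        if w[n - g:] != bar(w[:g]):              # suffix must be bar(gamma)
            continue
        core = w[g:n - g]                        # alpha beta bar(alpha)
        if core[len(core) - k:] != bar(core[:k]):
            continue
        if in_l1(w[:n - g]) or in_l2(w[g:]):
            return True
    return False

# Hypothetical usage: L1 = a* b bar(a)^k with k = 1, and L2 empty.
in_l1 = lambda u: re.fullmatch(r"a*bA", u) is not None
in_l2 = lambda u: False
print(in_hairpin_completion("aabAA", 1, in_l1, in_l2))
print(in_hairpin_completion("abAA", 1, in_l1, in_l2))
```

The membership predicates are passed as functions so that the sketch does not depend on a particular automaton representation.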

A regular language can be specified by a non-deterministic finite automaton (NFA)
$\mathcal{A} = (Q, \Sigma, E, \mathcal{I}, \mathcal{F})$, where $Q$ is the finite set of states, $\mathcal{I} \subseteq Q$ is the set of initial states,
and $\mathcal{F} \subseteq Q$ is the set of final states. The set $E$ contains labeled edges (or arcs); it is a subset
of $Q \times \Sigma \times Q$. For a word $u \in \Sigma^*$ we write $p \xrightarrow{u} q$ if there is a path from state $p$ to $q$ which
is labeled by the word $u$. Thus, the accepted language becomes

$$L(\mathcal{A}) = \{ u \in \Sigma^* \mid \exists p \in \mathcal{I}\ \exists q \in \mathcal{F} : p \xrightarrow{u} q \}.$$



Later it will be crucial to use also paths which avoid final states. For this we introduce a
special notation. First remove all arcs $(p, a, q)$ where $q \in \mathcal{F}$ is a final state. Thus, final states do
not have incoming arcs anymore. Let us write $p \overset{u}{\Longrightarrow} q$ if there is a path from state $p$ to $q$ which
is labeled by the word $u$ in this new automaton after removing these arcs. Note that for such a
path $p \overset{u}{\Longrightarrow} q$ we allow $p \in \mathcal{F}$, but on the path we never enter any final state again.

An NFA is called a deterministic finite automaton (DFA) if it has exactly one initial state and
for every state $p \in Q$ and every letter $a \in \Sigma$ there is exactly one arc $(p, a, q) \in E$. In particular,
in this paper a DFA is always complete. Thus, we can read every word to its end. We also write
$p \cdot u = q$ if $p \xrightarrow{u} q$. This yields a (totally defined) function $Q \times \Sigma^* \to Q$. (It defines an action of
$\Sigma^*$ on $Q$ on the right.)

In the following we use a DFA accepting $L_1$ as well as a DFA accepting $L_2$, but the DFA for
$L_2$ has to work from right to left. Instead of introducing this concept we use a DFA (working as
usual from left to right) which accepts $\overline{L_2}$. This automaton has the same number of states as
(and is structurally isomorphic to) a DFA accepting the reversal language of $L_2$.

As input we assume that the regular languages $L_1$ and $\overline{L_2}$ are specified by DFAs $\mathcal{A}_1$ and $\mathcal{A}_2$
with state set $Q_i$, state $q_{0i} \in Q_i$ as initial state, and $\mathcal{F}_i \subseteq Q_i$ as final states. By $n$ we denote the
input size

$$n = |Q_1| + |Q_2|.$$

We also need the usual product DFA with state space

$$Q = \{ (p_1, p_2) \in Q_1 \times Q_2 \mid \exists w \in \Sigma^* : (p_1, p_2) = (q_{01} \cdot w, q_{02} \cdot w) \}.$$

The action is given by $(p_1, p_2) \cdot a = (p_1 \cdot a, p_2 \cdot a)$. As $Q$ contains only reachable states, the
size of $Q$ might be smaller than $|Q_1| \cdot |Q_2|$. In the following we work simultaneously in all three
automata defined so far. Moreover, in $Q_1$ and $Q_2$ we are going to work backwards. This leads to
nondeterminism.
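
The reachable product state space $Q$ can be sketched as a breadth-first search. This is a plain deterministic illustration, not the log-space procedure of the paper; the dictionary representation `delta[(state, letter)]` and the function name `reachable_product` are assumptions of this example.

```python
from collections import deque

def reachable_product(delta1, delta2, q01, q02, alphabet):
    """Return the set of pairs (q01 . w, q02 . w) over all words w."""
    start = (q01, q02)
    seen = {start}
    queue = deque([start])
    while queue:
        p1, p2 = queue.popleft()
        for a in alphabet:
            nxt = (delta1[p1, a], delta2[p2, a])   # (p1, p2) . a
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

# Hypothetical usage: two copies of a two-state cycle over a one-letter alphabet.
delta = {(0, "a"): 1, (1, "a"): 0}
Q = reachable_product(delta, delta, 0, 0, ["a"])
print(sorted(Q))
```

In this usage only two of the four pairs are reachable, which illustrates why $|Q|$ can be smaller than $|Q_1| \cdot |Q_2|$.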



3 Main result 

The purpose of this paper is to prove the following result: 

Theorem 3.1. The following problem is NL-complete:

Input: Two DFAs $\mathcal{A}_1$ and $\mathcal{A}_2$ recognizing $L_1$ and $\overline{L_2}$ with state sets $Q_1$ and $Q_2$, respectively, such
that $n = |Q_1| + |Q_2|$.

Question: Is $\mathcal{H}_k(L_1, L_2)$ regular?

Since NL is included in P we obtain the following result from [5] as a corollary.

Corollary 3.2. The problem whether the hairpin completion $\mathcal{H}_k(L_1, L_2)$ is regular is decidable in
polynomial time.

We now turn to the proof of Theorem 3.1. The NL-hardness is immediate:

Lemma 3.3. The problem whether the hairpin completion $\mathcal{H}_k(L_1, L_2)$ is regular is NL-hard, even
for $L_2 = \emptyset$.

Proof. The well-known NL-complete Graph-Accessibility-Problem [18] can easily be reduced to
the following problem for DFAs:

Let $\Sigma = \{a, \overline{a}, b, \overline{b}\}$ be an alphabet with four letters. Decide for a given DFA, which accepts a
language $L \subseteq \{b, \overline{b}\}^*$, whether or not $L$ is empty.

Now let $L_1 = a^* L\, \overline{a}^k$. The hairpin completion

$$\mathcal{H}_k(L_1, \emptyset) = \{ a^i w\, \overline{a}^j \mid i \ge j \ge k \land w \in L \}$$

is regular if and only if $L$ is empty (because $L \subseteq \{b, \overline{b}\}^*$). □

The difficult part is to show that deciding regularity of $\mathcal{H}_k(L_1, L_2)$ is in NL. This is the subject
of the rest of this section.

3.1 Single-valued non-deterministic log-space transductions

A single-valued non-deterministic log-space transduction is performed by a non-deterministic
log-space Turing machine which may stop on every input $w$ with some output $r(w)$. Single-valued
means that, in case the machine stops on input $w$, the output is always the same, independently
of the non-deterministic moves during the computation. Thus, $w \mapsto r(w)$ is a well-defined
function from words to words. A single-valued non-deterministic log-space transduction is a
reduction from a language $L$ to $L'$ if we have $w \in L \iff r(w) \in L'$.

The following lemma belongs to folklore. Its proof is exactly the same as for the standard case
of deterministic log-space reductions [8] and therefore omitted.

Lemma 3.4. Let $L' \in \mathrm{NL}$ and assume that there exists a single-valued non-deterministic log-space
transduction from $L$ to $L'$. Then we have $L \in \mathrm{NL}$, too.

Due to Lemma 3.4 we are free to use several single-valued non-deterministic log-space
transductions in order to enrich the input.

3.2 Bridges 

Let $Q_1, Q_2$ be the state sets as fixed by Theorem 3.1. For every quadruple $(p_1, p_2, q_1, q_2) \in
Q_1 \times Q_2 \times Q_1 \times Q_2$ we define a regular language $B(p_1, p_2, q_1, q_2)$ as follows:

$$B(p_1, p_2, q_1, q_2) = \{ \beta \in \Sigma^* \mid p_1 \cdot \beta = q_1 \land p_2 \cdot \overline{\beta} = q_2 \}.$$

We say that a quadruple $(p_1, p_2, q_1, q_2)$ is a bridge if $B(p_1, p_2, q_1, q_2) \neq \emptyset$. The idea behind
this notation is that $B(p_1, p_2, q_1, q_2)$ closes a gap between the pairs $(p_1, p_2)$ and $(q_1, q_2)$. For a bridge
$(p_1, p_2, q_1, q_2)$ the words $\beta \in B(p_1, p_2, q_1, q_2)$ correspond later exactly to the $\beta$-part in Figure 1.

Lemma 3.5. There is a single-valued non-deterministic log-space transduction which outputs the
table of all bridges.

Proof. Graph reachability and its complement are solvable in NL. Therefore we can decide for
each quadruple $(p_1, p_2, q_1, q_2) \in Q_1 \times Q_2 \times Q_1 \times Q_2$ if it is a bridge, and we can output $(p_1, p_2, q_1, q_2)$
in the affirmative case. □
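
The reachability question behind Lemma 3.5 can be illustrated by the following Python sketch. It searches a graph on pairs of states in which the first component moves forward on a letter $b$ while the second component moves backward over $\overline{b}$, so that a path from $(p_1, q_2)$ to $(q_1, p_2)$ spells a word $\beta$ with $p_1 \cdot \beta = q_1$ and $p_2 \cdot \overline{\beta} = q_2$. The sketch is deterministic and uses more than logarithmic space; the transition tables, the complement map `comp`, and the name `is_bridge` are assumptions of this example.

```python
from collections import deque

def is_bridge(p1, p2, q1, q2, delta1, delta2, states2, alphabet, comp):
    """Decide whether some beta satisfies p1 . beta = q1 and p2 . bar(beta) = q2."""
    start, goal = (p1, q2), (q1, p2)
    seen = {start}
    queue = deque([start])
    while queue:
        s, t = queue.popleft()
        if (s, t) == goal:
            return True
        for b in alphabet:
            s_next = delta1[s, b]                    # forward step in the first DFA
            for t_prev in states2:                   # backward step in the second DFA
                if delta2[t_prev, comp[b]] == t and (s_next, t_prev) not in seen:
                    seen.add((s_next, t_prev))
                    queue.append((s_next, t_prev))
    return False

# Hypothetical two-state DFAs over {a, A}, where A encodes the barred letter.
comp = {"a": "A", "A": "a"}
delta1 = {(0, "a"): 1, (1, "a"): 1, (0, "A"): 0, (1, "A"): 1}
delta2 = {(0, "a"): 0, (0, "A"): 1, (1, "a"): 1, (1, "A"): 1}
print(is_bridge(0, 0, 1, 1, delta1, delta2, [0, 1], ["a", "A"], comp))
print(is_bridge(1, 0, 0, 0, delta1, delta2, [0, 1], ["a", "A"], comp))
```

In the first call the word $\beta = a$ witnesses the bridge; in the second call state 0 of the first DFA is not reachable from state 1, so no bridge exists.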
3.3 The NFA A

Next, we construct an NFA, which is simply called $\mathcal{A}$, and we explore properties of this NFA. The
NFA $\mathcal{A}$ uses $k + 1$ levels (or layers) of a product automaton over $Q \times Q_1 \times Q_2 \subseteq Q_1 \times Q_2 \times Q_1 \times Q_2$
where $Q$ has been defined as in Section 2. Hence, the number of states is at most $(k + 1)n^4$, which
is in $O(n^4)$.

Formally, we use a level for each $\ell$ with $0 \le \ell \le k$, hence there are $k + 1$ levels. By $[k]$ we
denote in this paper the set $\{0, \ldots, k\}$. Define

$$Q_{\mathcal{A}} = \{ ((p_1, p_2), q_1, q_2, \ell) \in Q \times Q_1 \times Q_2 \times [k] \mid (p_1, p_2, q_1, q_2) \text{ is a bridge} \}$$

as the state space of an NFA called $\mathcal{A}$.

We call a state $((p_1, p_2), q_1, q_2, \ell)$ a bridge at level $\ell$, and we keep in mind that there exists a
word $w$ such that $p_1 \cdot w = q_1$ and $p_2 \cdot \overline{w} = q_2$. Frequently (and by a slight abuse of language) we call
a state $((p_1, p_2), q_1, q_2, \ell)$ simply a bridge, too. Bridges at level $\ell$ are also denoted by $(P, q_1, q_2, \ell)$
with $P = (p_1, p_2) \in Q$, $q_i \in Q_i$, $i = 1, 2$, and $\ell \in [k]$. Bridges at different levels play a central role
in the following.

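Let $a \in \Sigma$. The $a$-transitions in the NFA are given by the following arcs:

$(P, q_1 \cdot \overline{a}, q_2 \cdot \overline{a}, 0) \xrightarrow{a} (P \cdot a, q_1, q_2, 0)$ for $q_i \cdot \overline{a} \notin \mathcal{F}_i$, $i = 1, 2$,

$(P, q_1 \cdot \overline{a}, q_2 \cdot \overline{a}, 0) \xrightarrow{a} (P \cdot a, q_1, q_2, 1)$ for $q_1 \cdot \overline{a} \in \mathcal{F}_1$ or $q_2 \cdot \overline{a} \in \mathcal{F}_2$,

$(P, q_1 \cdot \overline{a}, q_2 \cdot \overline{a}, \ell) \xrightarrow{a} (P \cdot a, q_1, q_2, \ell + 1)$ for $1 \le \ell < k$.

Thus, for the $P$-component an $a$-transition behaves as in a usual product automaton, but for
the $q_1$- and $q_2$-components we move backwards using the $\overline{a}$-transitions in the original automata.
This is why the resulting automaton $\mathcal{A}$ is non-deterministic.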

Observe that no state of the form $(P, q_1, q_2, 0)$ with $q_1 \in \mathcal{F}_1$ or $q_2 \in \mathcal{F}_2$ has an outgoing arc to
level zero; we must switch to level one. There are no outgoing arcs on level $k$, and for each tuple
$(a, P, q_1, q_2, \ell) \in \Sigma \times Q \times Q_1 \times Q_2 \times [k-1]$ there exists at most one arc $(P, q_1', q_2', \ell) \xrightarrow{a} (P \cdot a, q_1, q_2, \ell')$.
Indeed, $P \cdot a$ is determined by $P$ and the letter $a$, and the triple $(q_1', q_2', \ell')$ is determined by
$(q_1, q_2, \ell)$ and the letter $a$. Not all such arcs exist in $\mathcal{A}$, because $(P, q_1', q_2', \ell)$ might be a bridge
whereas $(P \cdot a, q_1, q_2, \ell')$ is not. (Observe however that if $(P \cdot a, q_1, q_2, \ell')$ is a bridge, then $(P, q_1', q_2', \ell)$
is a bridge, too.)

The set of initial states $\mathcal{I}$ contains all bridges at level zero of the form $(Q_0, q_1', q_2', 0)$ with
$Q_0 = (q_{01}, q_{02})$. The set of final states $\mathcal{F}$ is given by all bridges $(P, q_1, q_2, k)$ at level $k$.

This concludes the definition of the NFA $\mathcal{A}$. For an example and a graphical presentation of
the NFA, see Figure 2.
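
The transition rule of the NFA can be sketched in Python as a function that lists all $a$-successors of a given state. The sketch assumes DFAs given as dictionaries, a complement map `comp`, and a bridge predicate passed in as `is_bridge`; all of these names are hypothetical and only serve the illustration.

```python
def successors(state, a, k, delta1, delta2, states1, states2,
               final1, final2, comp, is_bridge):
    """List all targets of a-arcs leaving `state` = ((p1, p2), r1, r2, level)."""
    (p1, p2), r1, r2, level = state
    if level == k:
        return []                                  # no outgoing arcs on level k
    if level == 0 and r1 not in final1 and r2 not in final2:
        next_level = 0                             # stay on level zero
    else:
        next_level = level + 1                     # switch to or advance on a higher level
    P_next = (delta1[p1, a], delta2[p2, a])        # forward step in the product automaton
    targets = []
    for q1 in states1:                             # backward step: q1 . bar(a) = r1
        if delta1[q1, comp[a]] != r1:
            continue
        for q2 in states2:                         # backward step: q2 . bar(a) = r2
            if delta2[q2, comp[a]] != r2:
                continue
            if is_bridge(P_next[0], P_next[1], q1, q2):
                targets.append((P_next, q1, q2, next_level))
    return targets

# Hypothetical usage with two-state DFAs over {a, A}; every quadruple is treated as a bridge.
comp = {"a": "A", "A": "a"}
delta1 = {(0, "a"): 1, (1, "a"): 1, (0, "A"): 0, (1, "A"): 1}
delta2 = {(0, "a"): 0, (0, "A"): 1, (1, "a"): 1, (1, "A"): 1}
always = lambda *quadruple: True
out = successors(((0, 0), 1, 1, 0), "a", 1, delta1, delta2,
                 [0, 1], [0, 1], {1}, set(), comp, always)
print(out)
```

The inner loops make the non-determinism visible: several states $q_i$ may satisfy $q_i \cdot \overline{a} = r_i$, so one state can have several $a$-successors.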

Remark 3.6. By Lemma 3.5 the NFA $\mathcal{A}$ can be computed by a single-valued non-deterministic
log-space transduction. Thus, we have direct access to $\mathcal{A}$ and henceforth we assume that $\mathcal{A}$ is also
written on the input tape.

The next result shows the unambiguity of paths in the automaton $\mathcal{A}$. It is a crucial property.

Lemma 3.7. Let $w \in \Sigma^*$ be the label of a path in $\mathcal{A}$ from a bridge $A = (P, p_1, p_2, \ell)$ to $A' =
(P', p_1', p_2', \ell')$, then the path is unique. This means that $B = B'$ whenever $w = uv$ and

$$A \xrightarrow{u} B \xrightarrow{v} A', \qquad A \xrightarrow{u} B' \xrightarrow{v} A'.$$

Proof. It is enough to consider $u = a \in \Sigma$. Let $B = (Q, q_1, q_2, m)$. Then we have $Q = P \cdot a$ and
$q_i = p_i' \cdot \overline{v}$. If $\ell = 0$ and $p_i \notin \mathcal{F}_i$ for $i = 1, 2$, then $m = 0$, too; otherwise $m = \ell + 1$. Thus, $B$ is
determined by $A$, $A'$, and $u, v$. We conclude $B = B'$. □

Figure 2: DFAs for Li and L2 and the resulting NFA A with 4 initial states and 5 final states
associated to the (linear context-free) hairpin completion Hk{L\, L2) — a + ba + U{a s ba t s > t > 1} 
with fc = 1. 



We will now show that the automaton $\mathcal{A}$ encodes the hairpin completion in a natural way. For
languages $U$ and $V$ we define the language $V^U$ as follows:

$$V^U = \{ u v \overline{u} \mid u \in U, v \in V \}.$$

Clearly, if $U$ and $V$ are regular, then $V^U$ is linear context-free, but not regular, in general. (The
notation $V^U$ is adopted from group theory where exponentiation denotes conjugation and the
canonical involution refers to taking inverses.)

Lemma 3.8. For each pair $\tau = (I, F) \in \mathcal{I} \times \mathcal{F}$ with $F = ((d_1, d_2), e_1, e_2, k)$ let $R_\tau$ be the
(regular) set of words which label a path from the initial bridge $I$ to the final bridge $F$, and let
$B_\tau = B(d_1, d_2, e_1, e_2)$.

The hairpin completion $\mathcal{H}_k(L_1, L_2)$ is a disjoint union

$$\mathcal{H}_k(L_1, L_2) = \bigcup_{\tau \in \mathcal{I} \times \mathcal{F}} B_\tau^{R_\tau}.$$

Moreover, for each word $w \in B_\tau^{R_\tau}$ there is a unique factorization $w = \rho\beta\overline{\rho}$ with $\rho \in R_\tau$ and
$\beta \in B_\tau$.

Proof. Let $w \in \mathcal{H}_k(L_1, L_2)$. There exists some factorization $w = \gamma\alpha\beta\overline{\alpha}\overline{\gamma}$ such that $|\alpha| = k$ and
there are runs as in Figure 3 in the original DFAs $\mathcal{A}_1$ and $\mathcal{A}_2$ where $f_1 \in \mathcal{F}_1$ or $f_2 \in \mathcal{F}_2$ (or both):

Figure 3: Some run defined by $w \in \mathcal{H}_k(L_1, L_2)$

Choosing among all these runs the length $|\gamma|$ to be minimal, we see that we actually find the
following picture according to Figure 4. In other words, either $\gamma\alpha\beta\overline{\alpha}$ is the longest prefix of $w$
belonging to $L_1$ or $\alpha\beta\overline{\alpha}\overline{\gamma}$ is the longest suffix of $w$ belonging to $L_2$, or both. The difference to
the preceding figure is that between $f_i$ and $q_i'$ ($i = 1, 2$) we never enter a final state. By the
definition of the NFA $\mathcal{A}$ we see that $\rho = \gamma\alpha$ is the unique prefix of $w$ such that $w = \rho\beta\overline{\rho}$ with
$\rho \in R_\tau$ and $\beta \in B_\tau$ for some $\tau$.

$L_1$: $q_{01} \to c_1 \to d_1 \to e_1 \to f_1 \Rightarrow q_1'$
$L_2$: $q_{02} \to c_2 \to d_2 \to e_2 \to f_2 \Rightarrow q_2'$

Figure 4: The unique run defined by $w \in \mathcal{H}_k(L_1, L_2)$ with $|\gamma|$ minimal

Now, as the length $|\gamma|$ is fixed by $w$, we see that all states $c_i$,
$d_i$, $e_i$, $f_i$, and $q_i'$ are uniquely defined by $w$ for $i = 1, 2$. Thus, there is a unique $\tau \in \mathcal{I} \times \mathcal{F}$ with
$w \in B_\tau^{R_\tau}$. More precisely, we have:

$$\tau = (((q_{01}, q_{02}), q_1', q_2', 0), ((d_1, d_2), e_1, e_2, k)).$$

□



3.4 First Tests 

By construction, the automaton $\mathcal{A}$ accepts the union of the languages $R_\tau$ as defined in Lemma 3.8.
If the accepted language is finite then all $R_\tau$ are finite and hence all $B_\tau^{R_\tau}$ are regular. This leads
immediately to the following result:

Proposition 3.9. It can be decided in NL whether or not the accepted language of the NFA $\mathcal{A}$ is
finite. If the accepted language is finite, then the hairpin completion $\mathcal{H}_k(L_1, L_2)$ is regular.

Proof. To see that the accepted language is infinite it is enough to guess a path from an initial
state to a final one which uses some (guessed) state at least twice. Since NL is closed under
complementation the finiteness test is possible in NL, too. The second assertion follows from
Lemma 3.8. □
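
The criterion used in this proof (an accepting path that repeats a state) can be checked deterministically as follows. This Python sketch is not the NL procedure itself: it stores sets of states explicitly, and the adjacency-list representation `succ` as well as the function name are assumptions of the example. It relies on the fact that every arc reads one letter, so a cycle through a useful state yields infinitely many accepted words.

```python
def accepts_infinite_language(succ, initial, final):
    """succ maps a state to the list of its successor states (arc labels are irrelevant here)."""
    def reach(starts, edges):
        seen, stack = set(starts), list(starts)
        while stack:
            s = stack.pop()
            for t in edges.get(s, ()):
                if t not in seen:
                    seen.add(t)
                    stack.append(t)
        return seen

    pred = {}
    for s, targets in succ.items():
        for t in targets:
            pred.setdefault(t, []).append(s)

    # Useful states: reachable from an initial state and leading to a final state.
    useful = reach(initial, succ) & reach(final, pred)

    # Topological elimination: states left over lie on or behind a cycle of useful states.
    indeg = {s: 0 for s in useful}
    for s in useful:
        for t in succ.get(s, ()):
            if t in useful:
                indeg[t] += 1
    stack = [s for s in useful if indeg[s] == 0]
    removed = 0
    while stack:
        s = stack.pop()
        removed += 1
        for t in succ.get(s, ()):
            if t in useful:
                indeg[t] -= 1
                if indeg[t] == 0:
                    stack.append(t)
    return removed < len(useful)

print(accepts_infinite_language({0: [1], 1: [1, 2], 2: []}, {0}, {2}))
print(accepts_infinite_language({0: [1], 1: [2], 2: [], 3: [3]}, {0}, {2}))
```

The second call shows that a cycle on an unreachable state does not make the accepted language infinite.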

We check this property (although strictly speaking Test 0 is redundant):

Test 0: Decide in NL whether or not $L(\mathcal{A})$ is finite. If it is finite, then stop with the output that
$\mathcal{H}_k(L_1, L_2)$ is regular.

For convenience we may assume in the following that $\mathcal{A}$ accepts an infinite language and that
all states are reachable from an initial bridge and lead to some final bridge.

For the sake of completeness let us state another result which shows that deciding regularity of the
one-sided hairpin completion is somewhat easier, because the finiteness condition is also necessary
in this case. However, as we neither use this result nor does it change the NL-completeness of the
problem, we leave the proof of Proposition 3.10 to the interested reader.

Proposition 3.10. If $L_1$ or $L_2$ is finite, but the accepted language of $\mathcal{A}$ is infinite, then the
hairpin completion $\mathcal{H}_k(L_1, L_2)$ is not regular.



Let $K$ be the set of non-trivial strongly connected components of the automaton $\mathcal{A}$ (read as a
directed graph). Every non-trivial strongly connected component is on level 0 and, moreover, as
$\mathcal{A}$ accepts an infinite language, there is at least one. For $\kappa \in K$ let $N_\kappa$ be the number of states in
the component $\kappa$. We have $N_\kappa = |\kappa| \le n^4$.

The next lemma tells us that for a regular hairpin completion $\mathcal{H}_k(L_1, L_2)$ every strongly
connected component $\kappa \in K$ is a simple cycle.

Lemma 3.11. Let the hairpin completion $\mathcal{H}_k(L_1, L_2)$ be regular, $A \xrightarrow{v_A} A$ be a path in a strongly
connected component $\kappa$ with $1 \le |v_A| \le N_\kappa$, and let $A \xrightarrow{w} F$ be a path in $\mathcal{A}$ from $A$ to a final
bridge $F$. Then the word $w$ is a prefix of some word in $v_A^+$.

In addition, the word $v_A$ is uniquely defined by the conditions $A \xrightarrow{v_A} A$ and $1 \le |v_A| \le N_\kappa$.
The loop $A \xrightarrow{v_A} A$ visits every other state $B \in \kappa$ exactly once. Thus it builds a Hamiltonian cycle
of $\kappa$ and $|v_A| = N_\kappa$.

Proof. Let $A \xrightarrow{v} A$ be some non-trivial loop. We see that $A$ is on level zero. Consider a path
labeled by $w$ from $A$ to a final bridge $F = ((p_1, p_2), q_1, q_2, k)$. By assumption, all states in $\mathcal{A}$ are
reachable from some initial state. Thus, we find a word $u$ such that the automaton $\mathcal{A}$ accepts $u v^i w$
for all $i \ge 0$. We see next that $u v^i w \beta \overline{w}\, \overline{v}^i \overline{u} \in \mathcal{H}_k(L_1, L_2)$ for all $i \ge 0$ and all $\beta \in B(p_1, p_2, q_1, q_2)$.
As $\mathcal{H}_k(L_1, L_2)$ is regular, there are $s, t \in \mathbb{N}$ with $u v^s w \beta \overline{w}\, \overline{v}^{s+t} \overline{u} \in \mathcal{H}_k(L_1, L_2)$ and $t > |w\beta|$, by
pumping. This means that the hairpin completion is forced to use a suffix in $L_2$, because the
longest prefix belonging to $L_1$ is too short to create the hairpin completion. Due to the definition
of $\mathcal{A}$ we conclude that $u v^s w$ must be a prefix of $u v^{s+t} w$. This implies that $w$ is a prefix of $v^t$ and
thus the first statement of our lemma.

Let $v_A$ be some shortest non-empty word such that $A \xrightarrow{v_A} A$. Observe first that $|v_A| \le N_\kappa$. Now, let
$A \xrightarrow{v'} B \in \kappa$ and $B \xrightarrow{v''} A$. For some $i, j > 0$ we have $|v_A^i| = |(v'v'')^j|$. Thus, $v_A^i = (v'v'')^j$
by the first statement. By the unique-path-property stated in Lemma 3.7 we obtain that the loop
$A \xrightarrow{(v'v'')^j} A$ just uses the shortest loop $A \xrightarrow{v_A} A$ several times. In particular, $B$ is on the shortest
loop around $A$. This yields $|v_A| \ge N_\kappa$ and hence the second statement. □

Example 3.12. In the example given in Figure 2 the state $(Q_0, t_1, t_2, 0)$ forms the only strongly
connected component and the corresponding path is labeled with $a$. As one can easily observe, the
automaton $\mathcal{A}$ satisfies the properties stated in Lemma 3.11 (even though the hairpin completion
is not regular).

Due to the technique of single-valued non-deterministic log-space transductions we may assume
that the set of non-trivial strongly connected components $K$ is part of the input. Moreover, for
each state $A$ and $\kappa \in K$ we know whether or not $A \in \kappa$, and we know $N_\kappa = |\kappa|$.

The next test tries to falsify the property of Lemma 3.11. Hence it gives a sufficient condition
that $\mathcal{H}_k(L_1, L_2)$ is not regular.
Test 1: Guess some state $A$ and $\kappa \in K$ with $A \in \kappa$, a letter $a \in \Sigma$, and a position $1 \le m \le N_\kappa$
such that:

1.) There is a path $A \xrightarrow{v} A$ where $m \le |v| \le N_\kappa$ and $v(m) = a$.

2.) There is a path $A \xrightarrow{w} F$ where $w(i \cdot |v| + m) \neq a$ for some $i \in \mathbb{N}$ with $1 \le i \cdot |v| + m \le |w|$.

If such a triple $(A, a, m)$ exists, then output that $\mathcal{H}_k(L_1, L_2)$ is not regular.

The correctness of Test 1 follows by Lemma 3.11 and, because for the existence of paths 1.)
and 2.) we only have to remember the triple $(A, a, m)$, Test 1 can be performed in NL.
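
The property that Test 1 tries to falsify, namely that $w$ is a prefix of some word in $v^+$, can be illustrated for explicitly given words. The Python sketch below returns the first position where $w$ deviates from the periodic continuation of $v$; the function name `first_mismatch` is hypothetical, and unlike the NL test the sketch has both words fully available.

```python
def first_mismatch(v, w):
    """Return the first 1-based position j with w(j) != v((j-1) mod |v| + 1), or None."""
    for pos, letter in enumerate(w):
        if letter != v[pos % len(v)]:
            return pos + 1
    return None

print(first_mismatch("ab", "ababa"))
print(first_mismatch("ab", "abb"))
```

A result of `None` means that $w$ is a prefix of a word in $v^+$. Any other result is a position of the form $i \cdot |v| + m$ as in condition 2.) of Test 1.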






Remark 3.13. We can perform Test 1 in NL and in case it yields that the hairpin completion
$\mathcal{H}_k(L_1, L_2)$ is not regular, we can stop. Henceforth, we assume that the algorithm did not stop
during Test 1 and that every strongly connected component $\kappa \in K$ is a simple cycle. Performing
another single-valued non-deterministic log-space transduction we may assume that for each $A \in \kappa$
the word $v_A$ is attached to $A$ and each $v_A$ is part of the input.

3.5 Second and Third Test 

We fix a bridge $A = ((p_1, p_2), q_1, q_2, 0)$ in a strongly connected component. We let $v = v_A$ as
defined in Lemma 3.11 and let $\alpha$ be the prefix of length $k$ of some long enough word in $v^+$. (By
Remark 3.13 the word $v$ is written in plain form on the input tape.) By $u$ we denote some word
leading from an initial bridge to $A$. (The NL algorithm does not know $u$, but it knows that it
exists.) The main idea is to investigate runs through the DFAs for $L_1$ and $L_2$ where $s, t \ge n$
according to Figure 5. Recall that $n$ refers to the original input size, thus $n \ge |Q_i|$ for $i = 1, 2$.

$L_1$: $q_{01} \to p_1 \to p_1 \to c_1 \to d_1 \to e_1 \Rightarrow q_1 \Rightarrow q_1 \Rightarrow q_1'$
$L_2$: $q_{02} \to p_2 \to p_2 \to c_2 \to d_2 \to e_2 \Rightarrow q_2 \Rightarrow q_2 \Rightarrow q_2'$

Figure 5: Runs through $\mathcal{A}_1$ and $\mathcal{A}_2$ based on the loop $A \xrightarrow{v} A$

We investigate the case where $u v^s x y \overline{\alpha}\, \overline{v}^t \overline{u} \in \mathcal{H}_k(L_1, L_2)$ for all $s \ge t$ and where (by symmetry)
this property is due to the longest prefix belonging to $L_1$ (hence $e_1 \in \mathcal{F}_1$).

The following lemma is rather technical. The notation is, however, chosen to fit exactly to
Figure 5.

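Lemma 3.14. Let $x, y \in \Sigma^*$ be words and $(d_1, d_2) \in Q_1 \times Q_2$ with the following properties:

1.) $\alpha \le x$ and $x < v\alpha$.

2.) $y \in B(c_1, c_2, d_1, d_2)$, where $c_1 = p_1 \cdot x$ and $c_2 = p_2 \cdot \alpha$, and $x$ is the longest common prefix of
$xy$ and $v\alpha$.

3.) $e_1 = d_1 \cdot \overline{\alpha} \in \mathcal{F}_1$ is a final state, $q_1 = e_1 \cdot \overline{v}^n$, and during the computation of $e_1 \cdot \overline{v}^n$ we do not
enter a final state in $\mathcal{F}_1$.

4.) $e_2 = d_2 \cdot \overline{x}$ and $q_2 = e_2 \cdot \overline{v}^n$. Moreover, during the computation of $e_2 \cdot \overline{v}^n$ we do not enter a
final state in $\mathcal{F}_2$ (but $e_2 \in \mathcal{F}_2$ is possible).

If $\mathcal{H}_k(L_1, L_2)$ is regular, then there exists a factorization $x y \overline{\alpha}\, \overline{v} = \mu\delta\beta\overline{\delta}\overline{\mu}$ where $|\delta| = k$ and
$p_2 \cdot \mu\delta\overline{\beta}\overline{\delta} \in \mathcal{F}_2$ (which implies $\delta\beta\overline{\delta}\overline{\mu}\, \overline{v}^* \overline{u} \subseteq L_2$).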

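Proof. The conditions imply that $u v^s x y \overline{\alpha}\, \overline{v}^t \overline{u} \in \mathcal{H}_k(L_1, L_2)$ for all $s \ge t \ge n$. Moreover, by 3.)
the hairpin completion can be achieved with a prefix in $L_1$ and the longest prefix of $u v^s x y \overline{\alpha}\, \overline{v}^t \overline{u}$
belonging to $L_1$ is $u v^s x y \overline{\alpha}$.

If $\mathcal{H}_k(L_1, L_2)$ is regular, then we have $u v^s x y \overline{\alpha}\, \overline{v}^{s+1} \overline{u} \in \mathcal{H}_k(L_1, L_2)$, too, as soon as $s$ is large
enough, by a simple pumping argument. For this hairpin completion we must use a suffix belonging
to $L_2$. For $y = 1$ this follows from $x < v\alpha$. For $y \neq 1$ we use $x < v\alpha$ and additionally that the
word $xa$ with $a = y(1)$ is not a prefix of $v\alpha$.

By 4.) the longest suffix of $u v^s x y \overline{\alpha}\, \overline{v}^{s+1} \overline{u}$ belonging to $L_2$ is a suffix of $x y \overline{\alpha}\, \overline{v}^{s+1} \overline{u}$. Thus, we
can write

$$u v^s x y \overline{\alpha}\, \overline{v}^{s+1} \overline{u} = u v^s x y \overline{\alpha}\, \overline{v}\, \overline{v}^s \overline{u} = u v^s \mu\delta\beta\overline{\delta}\overline{\mu}\, \overline{v}^s \overline{u}$$

where $\delta\beta\overline{\delta}\overline{\mu}\, \overline{v}^s \overline{u} \in L_2$ and $|\delta| = k$. We obtain $x y \overline{\alpha}\, \overline{v} = \mu\delta\beta\overline{\delta}\overline{\mu}$.

(Recall that our second DFA $\mathcal{A}_2$ accepts $\overline{L_2}$.) Hence, as $p_2 = q_{02} \cdot u$ and $p_2 = p_2 \cdot v$, we see
that $p_2 \cdot \mu\delta\overline{\beta}\overline{\delta} \in \mathcal{F}_2$.

We conclude as desired: if $\mathcal{H}_k(L_1, L_2)$ is regular, then $p_2 \cdot \mu\delta\overline{\beta}\overline{\delta} \in \mathcal{F}_2$. □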

Example 3.15. Let us take a look at Figure 2 again. Let $A = (Q_0, t_1, t_2, 0)$, $v = a$ and $u = 1$. If
we choose $x = a$, $y = b$ and $(d_1, d_2) = (p_1, p_2)$ we can see that conditions 1.) to 4.) of Lemma 3.14
are satisfied but there is no factorization $a b \overline{a}\, \overline{a} = \mu\delta\beta\overline{\delta}\overline{\mu}$ with $|\delta| = k$ such that $\delta\beta\overline{\delta}\overline{\mu}\, \overline{u} \in L_2$. Hence,
the hairpin completion is not regular.

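The next lemma yields another sufficient condition that $\mathcal{H}_k(L_1, L_2)$ is not regular.

Lemma 3.16. The existence of words $x, y \in \Sigma^*$ and states $(d_1, d_2) \in Q_1 \times Q_2$ satisfying 1.) to
4.) of Lemma 3.14, but where for all factorizations $x y \overline{\alpha}\, \overline{v} = \mu\delta\beta\overline{\delta}\overline{\mu}$ we have $p_2 \cdot \mu\delta\overline{\beta}\overline{\delta} \notin \mathcal{F}_2$, can be
decided in NL.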

Proof. It is enough to perform either Test 2 or 3 below (non-deterministically chosen) and to
prove the NL performance of these tests. The tests distinguish whether the word $y$ is empty or
non-empty.

Test 2: Decide the existence of a word $x$ and states $(d_1, d_2) \in Q_1 \times Q_2$ satisfying 1.) to 4.)
of Lemma 3.14 with $y = 1$, but where for all factorizations $x \overline{\alpha}\, \overline{v} = \mu\delta\beta\overline{\delta}\overline{\mu}$ we have $p_2 \cdot \mu\delta\overline{\beta}\overline{\delta} \notin \mathcal{F}_2$.
If we find such a situation, then output that $\mathcal{H}_k(L_1, L_2)$ is not regular.

Test 3: Decide the existence of words $x, y$ with $y \neq 1$ and states $(d_1, d_2) \in Q_1 \times Q_2$ satisfying
1.) to 4.) of Lemma 3.14, but where for all factorizations $x y \overline{\alpha}\, \overline{v} = \mu\delta\beta\overline{\delta}\overline{\mu}$ we have $p_2 \cdot \mu\delta\overline{\beta}\overline{\delta} \notin \mathcal{F}_2$.
If we find such a situation, then output that $\mathcal{H}_k(L_1, L_2)$ is not regular.

The correctness of both tests follows by Lemma 3.14 and they can be performed as follows: For
both tests we guess the length of a word $x$ which satisfies 1.) and which is therefore a prefix of $v\alpha$.
Thus we can remember $x$, because $v_A$ is available by the input. We guess states $(d_1, d_2) \in Q_1 \times Q_2$,
and verify that conditions 3.) and 4.) hold, which is easy because we can reconstruct $x$. For Test 2
we check that $p_1 \cdot x = d_1$ and $p_2 \cdot \alpha = d_2$. Then we have to test whether for all factorizations
$x \overline{\alpha}\, \overline{v} = \mu\delta\beta\overline{\delta}\overline{\mu}$ with $|\delta| = k$ the condition $p_2 \cdot \mu\delta\overline{\beta}\overline{\delta} \notin \mathcal{F}_2$ holds. This can easily be done in NL
because we have full access to the word $x \overline{\alpha}\, \overline{v}$.

Test 3 is a bit more tricky. We guess $a \in \Sigma$ and we check that $xa$ is not a prefix of $v\alpha$.
We have to verify that a path from $c_1$ to $d_1$ exists which is labelled by some non-empty word
$y \in a\Sigma^*$ and that a path from $c_2$ to $d_2$ exists which is labelled by $\overline{y}$. This can be achieved by a
graph reachability algorithm which uses forward edges in the DFA of $L_1$ and simultaneously uses
backward edges in the DFA of $L_2$. Now, in a factorization $x y \overline{\alpha}\, \overline{v} = \mu\delta\beta\overline{\delta}\overline{\mu}$ we cannot have that
$x$ is a proper prefix of $\mu\delta$, otherwise $xa$ would be a prefix of $v\alpha$. But this was excluded by the
choice of $a$. Thus, $\mu\delta$ is a prefix of $x$ and $\overline{\delta}\overline{\mu}$ is a suffix of $\overline{x}$. This means, to ensure that there is
no factorization with $p_2 \cdot \mu\delta\overline{\beta}\overline{\delta} \in \mathcal{F}_2$, we do not need to remember the word $y$. We just compute
$d_2 \cdot \overline{x}$ and during this computation we validate that there are no final states in $\mathcal{F}_2$ after $k$ or more
steps. □

We claim that, if all three tests did not yield that the hairpin completion $\mathcal{H}_k(L_1, L_2)$ is not
regular, then the hairpin completion is indeed regular. This will complete the proof of Theorem 3.1.

Lemma 3.17. Suppose no outcome of Tests 1, 2, and 3 is "not regular". Then the hairpin
completion $\mathcal{H}_k(L_1, L_2)$ is regular.






Figure 6: Runs through $\mathcal{A}_1$ and $\mathcal{A}_2$ for the word $\pi$. We assume $f_1 \in \mathcal{F}_1$.



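Proof. Let $\pi \in \mathcal{H}_k(L_1, L_2)$. Write $\pi = \gamma\alpha\beta\overline{\alpha}\overline{\gamma}$ with $|\gamma|$ minimal such that either $\gamma\alpha\beta\overline{\alpha} \in L_1$
or $\alpha\beta\overline{\alpha}\overline{\gamma} \in L_2$. By symmetry we assume $\gamma\alpha\beta\overline{\alpha} \in L_1$. We may also assume that $|\gamma| > 2n^4$ (cf.
Proposition 3.9 and Test 0). We can factorize $\gamma = uvw$ with $|uv| \le n^4$ and $1 \le |v| \le |w|$ such that
there are runs as in Figure 6.

We infer from Test 1 that $w\alpha$ is a prefix of some word in $v^+$. We may assume that $w \in v^+$ by
adjusting the choices of $u$, $v$, and $w$. (Possibly, $u$ gets longer but it is still shorter than $n^4$, $v$ is
transposed, and $w$ gets shorter.)

Hence, we can write $w\alpha\beta = v^m x y$ with $m \ge 0$ such that $v^m x$ is the maximal common prefix
of $w\alpha\beta$ and some word in $v^+$ with $\alpha \le x < v\alpha$.

We see that for some $s \ge t \ge 0$ we can write

$$\pi = u v^s x y \overline{\alpha}\, \overline{v}^t \overline{u}.$$

Moreover, $u v^s x y \overline{\alpha}\, \overline{v}^t \overline{u} \in \mathcal{H}_k(L_1, L_2)$ for all $s \ge t \ge 0$. There are only finitely many choices for
$u, v, x$ (due to the length bounds) and for each of them there is a regular set $R_y$ associated to the
finite collection of bridges such that

$$\pi \in \{ u v^s x R_y \overline{\alpha}\, \overline{v}^t \overline{u} \mid s \ge t \ge 0 \} \subseteq \mathcal{H}_k(L_1, L_2).$$

More precisely, we can choose $R_y = \{1\}$ for $y = 1$, and otherwise we can choose

$$R_y \in \{ B(c_1, c_2, d_1, d_2) \cap a\Sigma^* \mid (c_1, c_2, d_1, d_2) \text{ is a bridge and } a \in \Sigma \}.$$

Note that the sets $\{ u v^s x R_y \overline{\alpha}\, \overline{v}^t \overline{u} \mid s \ge t \ge 0 \}$ are not regular, in general. If we bound however
the exponent $t$ by $n$, then the finite union

$$\bigcup_{0 \le t < n} \{ u v^s x R_y \overline{\alpha}\, \overline{v}^t \overline{u} \mid s \ge t \}$$

becomes regular. Thus, we may assume that $t \ge n$. Let $e_2 = p_2 \cdot \alpha \overline{y}\, \overline{x}$. We have $e_2 \cdot \overline{v}^n = q_2$ and,
if there is a final state during the computation of $e_2 \cdot \overline{v}^n$, then for all $t > s \ge n$ and $y \in R_y$ we
have that $u v^s x y \overline{\alpha}\, \overline{v}^t \overline{u} \in \mathcal{H}_k(L_1, L_2)$, due to a suffix in $L_2$, and $u v^n v^+ x R_y \overline{\alpha}\, \overline{v}^+ \overline{v}^n \overline{u} \subseteq \mathcal{H}_k(L_1, L_2)$.

Otherwise Test 2 or 3 tells us that for all $y \in R_y$ the word $x y \overline{\alpha}\, \overline{v}$ has a factorization $\mu\delta\nu\overline{\delta}\overline{\mu}$
such that $|\delta| = k$ and $p_2 \cdot \mu\delta\overline{\nu}\overline{\delta} \in \mathcal{F}_2$. The paths $q_{02} \cdot u = p_2$ and $p_2 \cdot v = p_2$ yield $\delta\nu\overline{\delta}\overline{\mu}\, \overline{v}^* \overline{u} \subseteq L_2$
and, again, $u v^n v^+ x R_y \overline{\alpha}\, \overline{v}^+ \overline{v}^n \overline{u} \subseteq \mathcal{H}_k(L_1, L_2)$.

The hairpin completion $\mathcal{H}_k(L_1, L_2)$ is a finite union of regular languages and hence it is regular
itself. □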

4 Conclusion and open problems 

We have shown that the problem to decide the regularity of the hairpin completion $\mathcal{H}_k(L_1, L_2)$ for
given regular languages $L_1$ and $L_2$ is NL-complete. In particular it can be solved efficiently in
parallel with Boolean circuits of polynomial size and poly-log depth, because NL is contained in
Nick's Class $\mathrm{NC}^2$ (see e.g. [18, Thm. 16.1]).

Our NL-result is based on the fact that the input is given by DFAs accepting $L_1$ and $\overline{L_2}$. It is
open what happens if the input is given in a more concise form, say the input is given by NFAs
accepting $L_1$ and $\overline{L_2}$ (or $L_2$).

Another result of [4] says that the time complexity of the same problem is in $O(n^8)$. The full
proof of this fact is quite involved, and it employs different ideas. It will appear elsewhere. It
is open whether the $O(n^8)$ time bound is optimal. A further improvement on this time bound
seems however to ask for quite different ideas. So far, the best algorithm known (to us) considers
all pairs of states in the automaton $\mathcal{A}$. There are $\Omega(n^8)$ pairs and it is unclear how to avoid this
bound.

There is also a very natural variant of hairpin completion which was introduced in [5]. It has
been called partial hairpin completion and further investigated in [14], where the operation has
been called hairpin lengthening. The partial hairpin completion of $L_1$ and $L_2$ is given by the set
of words $\gamma\alpha\beta\overline{\alpha}\overline{\gamma'}$, where $\gamma'$ is a prefix of $\gamma$ and $\gamma\alpha\beta\overline{\alpha} \in L_1$ or $\gamma$ is a prefix of $\gamma'$ and $\alpha\beta\overline{\alpha}\overline{\gamma'} \in L_2$.

Again, the partial hairpin completion of a regular language is linear context-free, but not 
regular, in general. It is open whether regularity of the partial hairpin completion of regular 
languages is decidable. 